extend router replay #2703

Open
faresobeid wants to merge 1 commit into main from router_replay_extend

Conversation

faresobeid (Contributor) commented Jun 4, 2026

A replayed expert is kept only if the trainer's score for it is above the score of the trainer's weakest top-k expert multiplied by the ratio.


Note

Medium Risk
Changes MoE expert selection on the training forward path when filtering is enabled, which can alter gradients and load balancing; default-off config limits blast radius.

Overview
Adds optional plausibility filtering for MoE router replay during RL training. With trainer.router_replay_score_threshold_ratio set above 0, each inference-replayed expert is kept only if the trainer router’s gate score for that expert is at least that fraction of the trainer’s weakest top-k score for the token; rejected slots are backfilled from the trainer’s own top-k picks. The default 0 leaves behavior unchanged (strict replay of inference routing).
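The filtering rule described above can be sketched for a single token in plain Python. This is an illustration under stated assumptions, not the code in this PR: the helper name `filter_replayed_experts` and its arguments are hypothetical, the comparison uses "at least" as worded in the overview, and rejected slots are backfilled in descending order of trainer score, which the description does not specify.

```python
from typing import List, Sequence


def filter_replayed_experts(
    trainer_scores: Sequence[float],
    replayed: Sequence[int],
    ratio: float,
) -> List[int]:
    """Pick the expert ids for one token (hypothetical helper, not the PR's code).

    trainer_scores: the trainer router's gate score for every expert.
    replayed: the top-k expert ids recorded at inference (assumed distinct).
    ratio: stands in for router_replay_score_threshold_ratio.
    """
    top_k = len(replayed)
    if ratio <= 0:
        # Default: strict replay of the inference routing.
        return list(replayed)

    # The trainer's own top-k picks, strongest first.
    trainer_top_k = sorted(
        range(len(trainer_scores)),
        key=lambda e: trainer_scores[e],
        reverse=True,
    )[:top_k]

    # Threshold: a fraction of the trainer's weakest top-k score.
    threshold = ratio * trainer_scores[trainer_top_k[-1]]

    keep = [trainer_scores[e] >= threshold for e in replayed]
    kept = {e for e, ok in zip(replayed, keep) if ok}

    # Backfill candidates: trainer picks that are not already kept.
    # Assumption: candidates are consumed strongest first.
    backfill = (e for e in trainer_top_k if e not in kept)

    # Keep slot order; replace only the rejected slots.
    return [e if ok else next(backfill) for e, ok in zip(replayed, keep)]
```

For example, with trainer scores `[0.40, 0.30, 0.15, 0.10, 0.05]`, replayed experts `[0, 3]`, and a ratio of `0.5`, the threshold is `0.5 * 0.30 = 0.15`. Expert 0 is kept, expert 3 (score 0.10) is rejected, and its slot is backfilled with expert 1, giving `[0, 1]`. With a ratio of `0`, the replayed list is returned unchanged.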

Wiring: new trainer config field, configure_router_replay_filter applied at model init when router replay is on, and logic in TokenChoiceTopKRouter plus docs for the inference/trainer TOML knobs. torch.histc inputs are cast to float where needed.

Reviewed by Cursor Bugbot for commit 7e7f36f.
